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We discuss the possibility of applying some standard statistical methods (the least square method, 
the maximum likelihood method, the method of statistical moments for estimation of parameters) 
to deterministically chaotic low-dimensional dynamic system (the logistic map) containing an obser- 
vational noise. A "pure" Maximum Likelihood (ML) method is suggested to estimate the structural 
parameter of the logistic map along with the initial value xi considered as an additional unknown 
parameter. Comparisons with previously proposed techniques on simulated numerical examples 
give favorable results (at least, for the investigated combinations of sample size A'^ and noise level). 
Besides, unlike some suggested techniques, our method does not require the a priori knowledge of 
the noise variance. We also clarify the nature of the inherent difficulties in the statistical analysis 
of deterministically chaotic time series and the status of previously proposed Bayesian approaches. 
We note the trade-off between the need of using a large number of data points in the ML analysis to 
decrease the bias (to guarantee consistency of the estimation) and the unstable nature of dynamical 
trajectories with exponentially fast loss of memory of the initial condition. The method of statistical 
moments for the estimation of the parameter of the logistic map is discussed. This method seems 
to be the unique method whose consistency for deterministically chaotic time series is proved so far 
theoretically (not only numerically). 

PACS numbers: 



The problem of characterizing and quantifying a noisy 
nonhnear dynamical chaotic system from a finite realiza- 
tion of a time series of measurements is full of difficulties. 
The first one is that one rarely has the luxury of know- 
ing the underlying dynamics, i.e., one does not in general 
know the underlying equations of evolution. Techniques 
to reconstruct a parametric representation of the time 
series then may lead to so-called model errors. 

Even in the rare situations where one can ascertain 
that the measurements correspond to a known set of 
equations with additive noise, the chaotic nature of the 
dynamics makes the estimation of the model parameters 
from time series surprisingly difhcult. This is true even 
for low-dimensional systems, another even rarer instance 
in naturally occurring time series. 

Here, we revisit the problem proposed by McSharry 
and Smith [1] , who introduced an improved method over 
standard least-square fits to estimate the structural pa- 
rameter of a low-dimensional deterministically chaotic 
system (the logistic map). We discuss the caveats un- 
derlying this problem, propose a "pure" Maximum Like- 
lihood method that we compare with previously proposed 
methods. Our conclusion stresses the inherent difficulties 
in formulating a bona fide statistical theory of structural 
parameter estimations for noisy deterministic chaos. 



I. DEFINITION AND NATURE OF THE 
PROBLEM 

Let us consider the supposedly simple problem consid- 
ered by McSharry and Smith [1], in which one measures 
the sample si, ...,sn with 



(1) 



where the underlying dynamical one-dimensional discrete 
recurrence equation 

Xi+i = F{xi,a) = 1 - ax} 



(2) 

is known and the rjiS are Gaussian N(Q,e) iid random 
variables with zero mean and standard deviation e. The 
problem is to determine the model parameter a from the 
measurements si, sat, knowing that (2) is the true dy- 
namics. 

At first sight, this problem looks like a statistical es- 
timation of an unknown structural parameter, given ob- 
servational data. However, strictly speaking, this prob- 
lem cannot be (even formally) refered to as a bona fide 
statistical problem in which the maximum likelihood 
(ML) method can be proved to be asymptotically opti- 
mal or even consistent. Indeed, the Likelihood Function 
L{a,xi\si, sn) reads 



\n.L{a,xi\si, sn) oc — 7Vln(e)- 
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where F(*)(.ti, a) is the i-th iteration of the logistic map 

(2) with parameter a and initial value xi. The key point 
of difficulty is that f''^ (xi , a) is a non- stationary function 
(despite the fact that the dynamical system (2) has an in- 
variant measure ii{x)). Standard statistical ML methods 
are applicable either to functions not depending on z, or 
depending on i in a periodic manner. For non-stationary 
and non-periodic dependence of the function on i, no sta- 
tistical theorem on optimal properties of MLE is a priori 
applicable. Then, numerical simulations of examples are 
not enough and should be complemented with proofs of 
results stating what known mathematical statistics prop- 
erties of ML or of Bayesian methods continue to apply to 

(3) . A first taste of the difficulty of the problem is given 
by an analysis of the behavior of the "one-step least- 
square (LS) estimation" and of the "total least-square" 
incithod, given in Appendix A. Appendix A shows that 
least-square methods are biased and should be corrected 
before comparing these to other methods, as done in [1] . 
In particular. Appendix A shows that it was a priori un- 
fair or inappropriate to compare any estimate obtained 
with a given method (such as the one advocated by Mc- 
Sharry and Smith [1]) to uncorrected ML-estimates due 
to the non-stationarity of the function; the appropriate 
corrections can be obtained from the standard statistical 
theory of confluence analysis [2-4]. 

II. A "PURE" MAXIMUM LIKELIHOOD 
APPROACH IN TERMS OF {a,xi) 

Putting aside the question of a rigorous demonstration 
of the consistency and asymptotic optimality of the MLE 

method, let us come back to expression (3), which is the 
straightforward translation of the iid Gaussian N{0,e) 
properties of the random variables rji's. It suggests that 
the problem of estimating the structTiral parameter a can- 
not actually be separated from estimating simultaneously 
the initial value xi. 

The MLE of {a,xi) amounts in this case to the mini- 
mization of the sum: 

Y,(s,-F<-'\x,,a)y , (4) 

i 

which looks superficially as a standard non-linear least- 
square sum. There is however one very important dis- 
tinction, as we already pointed out above: the non-linear 
function depends on the index i whereas, in the standard 
least-square method, one has a sum of the type 

J2isi-F{xi,a)f , (5) 

i 

where the xi, ...,xn arc assumed to be known. 

For the parameters a for which the logistic map ex- 
hibits the phenomenon of sensitivity upon the initial con- 
dition, the direct minimization of (4) is not feasible di- 
rectly. Indeed, if we disturb xi by a small number d, then 



the ith iterations F^^'>{xi,a) and F^^^{xi + S,a) diverge 
asymptotically exponentially fast with i: for instance, 
with an accuracy 6 = 10^^"'' and for a = 1.85, the differ- 
ence i^(*)(a;i,a) — F^^\xi + 6, a) becomes of order 1 for 
i > 20. This implies that, in practice, we cannot calcu- 
late with the necessary accuracy x^+i = F^^\xi,a) for 
i > 20. To address this fundamental limitation, we pro- 
pose to cut the sample si, ...,3^ into ni portions of size 
no more than n2 = 20, and to treat each portion sepa- 
rately. This amounts to re-estimating a different initial 
condition for each such sub-series, which is a natural step 
since the sensitivity upon initial conditions amounts to 
losing the information on the specific value of the initial 
condition. 

Our numerical tests show that our MLE works well 
(see below) by considering sub-series of size in the range 
n2 = 4 — 25 (for the true value of a equal to the value 
1.85 considered by by McSharry and Smith [1] that we 
take as our benchmark for the sake of comparison). For 
larger samples (say, N ~ 100), we recommend to cut 
this sample into rii subsamples of size n2 = 4 — 25, and 
treat them separately. It is possible that we lose some 
efficiency in treating subsamples separately, but a joint 
estimation would require the maximization of the likeli- 
hood with the common parameter a and several different 
initial value parameters. This procedure would lead to 
a very difficult numerical multivariate search problem as 
any gradient method would fail due to the very irregular 
structure of the likelihood function (see below and figure 
!)• 

The procedure we propose is thus to cut the initial 
time-series into ni independent subsamples of size n2 
in the range 4 — 25, and to average the resulting m 
a-estimates. In order to determine the optimal value 
of Til for a fixed N (say A'^ = 100) and for the value 
a = 1.85 investigated here, we calculate the standard de- 
viation sdt(a) over the ni subsamples as a function of ni. 
We find that, basically independently of the noise level 
e, the pair ni = 25, n2 = 4 gives the smallest standard 
deviation sdt(a). 

We have implemented this approach and compared it 
with the results obtained by the method proposed by 
McSharry and Smith [1], as discussed in the next section. 



III. ML VERSION OF MCSHARRY AND 
SMITH [1] AND COMPARISONS 

The main result of McSharry and Smith's paper [1] 
consists in their formulae (13,14) for their proposed ML 
cost function. Their idea is to substitute in the ML cost 
function the unknown invariant measure /Xo {x) of the dy- 
namical system (2), for a given value of the parameter a, 
for what should be a realization of the latent variables 
Xi's. Notice that a should be varied in order to deter- 
mine the maximum likelihood. In practice, the integral 
over the unknown invariant measure fJa{x) is replaced 
by a sum over a model trajectory (which can be calcu- 
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lated since the model is assiimed to be known) of length 
T ^ N. Unfortunately, this most important step is not 
confirmed by any numerical results (see below). 

Before continuing, let us note that there is a mis- 
take in the probability density function (pdf) and like- 
lihood given by their equations (7-9). Using the intu- 
ition that pairs {si, s^+i) should be used in their equation 
(5, 6) to track the deterministic relation between Xi and 
Xi+i = F{xi, a), we see that a single latent variable Xi is 
associated with each pair (si,Si+i) since Si is compared 
with Xi and Si+i with F{xi,a). Thus, each Xi is used 
only once when scanning all possible pairs (si,Si+i), for 
i = 1, N— 1 and in their ML cost function (13,14). Ac- 
tually, the correct likelihood should use only once each 
observed random variable Si, not the latent variable .Xj. 
Therefore, using pairs (si,Si+i), McSharry and Smith 
take into account each Si,i = 2, ...,N — 1 twice, and the 
end values si,sn once. For N ^ 2, their expression 
(7) is approximately equals (up to the end terms) to the 
square of the correct likelihood. Taking the logarithm in 
their equation (13) gives approximately twice the correct 
likelihood, which gives almost the same estimate as the 
exact likelihood. 

While this mistake has no serious consequences for the 
numerical accuracy of their calculation for long time se- 
ries -/V 3> 2, it illustrates the difference between their cx)n- 
struction of the likelihood and our direct approach pre- 
sented in the previous section. By writing the conditional 
likelihood for a pair (s^, s^+i) imder a latent variable Xi, 
and by averaging this conditional likelihood weighted by 
the invariant measure iJ,{x\a), McSharry and Smith sug- 
gest that, by doing so, they incorporate additional in- 
formation on the system in question. If we had a usual 
probability space, then such averaging would provide the 
unconditional likelihood of the pair (sj, Sj+i) but, for de- 
terministically chaotic time series, the exact meaning of 
this averaging is not clear. Another questionable step of 
McSharry and Smith is to multiply these pairwise likeli- 
hoods as if the pairs (si, Si+i) were independent. If this 
was so, this would indeed give the unconditional likeli- 
hood for the data sample Si, sjv. 

But, we deal here with "deterministic chaos" which 
generates not truly random variables (see for instance 
[5, 6] for discussions on the pseudo-randomness nature 
of such time series). Besides, we have some more in- 
formation about the structure of the system in ques- 
tion. Namely, we suppose known the generating relation 
(2). This relation contains everything and is, in princi- 
ple, much more informative than the stationary invariant 
measure /x(a;|a) (which is akin to a one-point statistics 
while (2) contains information on all higher-order point 
statistics). Concretely, it is clear that the product of 
pdf's for each pair (s^, Si+i) and the resulting likelihood 
depends solely on the first initial value xi since all sub- 
sequent Xi are deterministically determined recurrently. 
This remark gives the likelihood function (3) in terms of 
two unknown parameters (a.xi) to be estimated. This 
leads indeed to consider the initial state variable xi as an 
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1.885 


0.0467 


1.781 


1.959 


0.178 


0.766 



TABLE I: Comparison between McSharry and Smith's ML 
method [1] and our "pure" ML method described in section 
II over 1000 realizations of the system (2) with true value 
a = 1.85 giving 1000 time series of length N = 100, each 
them decorated with Gaussian noise with two different stan- 
dard deviations (0.5 and 1). qi and 32 are the sample quan- 
tiles at the 2.5% and 97.5% probability level, so that 92 — qi 
gives the width of the 95% confidence intervals. Our "pure" 
ML method provides us with an estimation e of the standard 
deviation of the noise given in the last column. 



unknown parameter to be estimated (along with a) from 
the sample si, sjv- The likelihood (3) provides a more 
detailed form than obtained by averaging over the in- 
variant measure fi{x\a). We can hope that our approach 
would lead to a more efficient estimate of a. McSharry 
and Smith avoid the maximization with respect to Xi in 
their likelihood (13,14) and replace it by an averaging 
over a proxy of the invariant measure. It is doubtful that 
such a step is warranted, not speaking of optimality, in 
view of our numerical tests presented below. 

We now compare our "pure" Maximum Likelihood ap- 
proach in terms of (a, xi) proposed in section II with Mc- 
Sharry and Smith's ML method, using numerical tests. 
We consider 1000 time series with TV = 100 data points 
and subdivide each of them into ni = 25 sub-series of 
n2 = 4 data points. We fix the true a equal to 1.85 as 
in [1] and study noises with standard deviations equal 
to 0.5 and 1.0. Table I shows a significant improvement 
offered by our "pure" ML method over McSharry and 
Smith's average ML, as least for the set of parameters 
studied here. It is not possible to guarantee that this 
will be the case for all possible parameter values but we 
believe our method can not be worse that McSharry and 
Smith's average ML. A difiiculty that should be men- 
tioned is that the chaotic nature of the dynamics and in 
particular the sensitivity of the invariant measure with 
respect to the control parameter a is reflected into an 
ugly- looking log-Likelihood landscape shown in Figure 1 , 
with many competing valleys. Standard numerical meth- 
ods like gradient or simplex are unapplicable. We have 
used a systematic 2D-grid search. Other methods in the 
field of computational intelligence, such as stimulated an- 
nealing and genetic algorithms, coidd also be used. The 
sensitivity of the invariant measure with respect to the 
control parameter a means that the invariant distribution 
can bifurcate from an almost uniform distribution on the 
interval [—a, 1] to a distribution consisting of three delta- 
functions (this happens around a w 1.75). 

In addition to performing better, our "pure" ML ap- 
proach does not depend on the noise level, in contrast 
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Contour lines of Likelihood L(a,X|),N=20,a^_^=l ,85;x^,_^^^=,9: 




1.5 1.55 1.6 1.65 1.7 1.75 1.8 1.85 1.9 1.95 2 



FIG. 1: Contour lines of the "pure" log-Likelihood given by 
expression (3) for a given realization of iV = 20 data points 
generated with a starting value xi — 0.9, a = 1.85 and noise 
std equal to 1. The log-Likelihood landscape is similar to a 
2D Brownian sheet (2D generalization of a random walk) . 

with the ML cost function (13,14) proposed by McSharry 
and Smith [1]. This is an important advantage when the 
true level of noise is not known (noise error). Our method 
is insensitive to such noise error while we have found 
examples where the optimal estimation of the structure 
parameter a with McSharry and Smith's method is ob- 
tained for a value of the noise standard deviation different 
from the true value. In general, the true noise level is not 
known and McSharry and Smith's method does not ap- 
ply in such situation. Our "pure" ML method actually 
provides us with an estimation e of the standard devi- 
ation of the noise given in the last column of Table I. 
These estimates have a small bias down (two fitted pa- 
rameters were taken into account), which may be due to 
the fact that ni is not sufficiently large (ni = 25; n2 = 4; 
iV = m X 712 = 100). 



IV. DISCUSSION OF OTHER APPROACHES 

Meyer and Christensen [7] have proposed to replace the 

ad hoc construction of McSharry and Smith's ML cost 
function by a Bayesian approach, assuming noninforma- 
tive priors for the structural parameter a, for the ini- 
tial value xi and for the standard deviation of the noise. 
Their approach improves significantly on McSharry and 
Smith [1] by recognizing the role of xi but turns out to be 
incorrect, as shown by Judd [8], because their approach 
amounts to assuming a stochastic model, thus refering to 
quite another problem. 

Based on the formulation of [9], Judd [8] develops a 
formulation which is almost identical to our "pure" ML 
(3) but there are important distinctions. Similarly to us, 
Judd introduces xi but he does not employ it. He prefers 



to eliminate the dependency on xi by averaging this pa- 
rameter with a fiducial distribution (see e.g. [4], Chapter 
21, Interval Estimation, Fiducial Intervals). Judd incor- 
rectly calls the method based on his equations (4,5) a ML 
method. In fact, his equations (4,5) gives a a hybrid of 
ML, Bayesian and so-called fiducial methods. It is a ML 
method with respect to the structural parameter a. It is 
Bayesian with respect to the initial value xi . It is fiducial 
since it does not assume any a-priori density for xi, but 
uses a prior density function p[si —w) (using the notation 
above) that is in fact a Gaussian density of the noise with 
mean value equal to the unknown initial value si. Using 
such density is equivalent to weighting a two-parameter 
likelihood by weights corresponding to different values of 
noise disturbances. Thus, the averaged likelihood (5) in 
[8] describes an ensemble of different noise disturbances 
of an unknown initial value si. This provides a (rea- 
sonable but not optimal) method of elimination of the 
second parameter xi from the maximization procedure. 
It is neither a pure Bayesian method (that would assume 
explicitly some a-priori density for si which could be ar- 
bitrary, and not necessarily equal to p(si — w)), nor a 
ML method for two unknown parameters as we suggested 
above in section II. 

In this context in view of the emphasis on Bayesian 
methods to solve this problem [7, 8], it is perhaps use- 
ful to stress that the probability theory rule P{A, B} = 
P{A\B} P{B} is often freely called "the Bayes rule." 
This is why the averaging of likelihoods over conditional 
state variables can be called Bayesian approaches, al- 
though this is not quite correct since the latent (state) 
variables are not random values in the standard meaning 
of this notion (as it is assumed by McSharry and Smith), 
although the state variables have a limit invariant mea- 
sure, as we said above. The Bayesian approach assumes 
that parameters are random values. For instance, Mc- 
Sharry and Smith assume that the latent (state) variables 
are random variables, which is not quite so, although the 
state variables have a limit invariant measure, as we said 
above. We stressed already that the series of state vari- 
ables can be considered as a degenerate set of random 
values that arc determined by one single random vari- 
able, namely Xi. What is more natural? To consider xi 
as a random variable with a distribution determined by 
the invariant mcasm-e, or to consider xi as an unknown 
parameter to be estimated? The answer, in our opinion, 
is dictated by consideration of efficiency: the different 
examples that we have explored suggest that the latter is 
as a rule more efficient (has smaller mean square error), 
at least for some combinations of sample size N and noise 
level. 

As all the above has shown, the major obstacle is the 
loss of information on the initial value Xi by the unstable 
logistic map beyond 10 — 25 time steps. We proposed the 
simple recipe of cutting the time series in short pieces 
and of averaging the estimations. Judd proposes a shad- 
owing method [8]. It is not obvious that this will result 
in a consistent estimation and that this will overcome the 
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intrinsic difficulty in treating long realizations (which is 
a necessary condition for unbiased estimations). 

In sum, there is no analytical proof of consistency for 
all the estimation methods discussed until now (including 
the suggestions performed by the most convincing work 
to date [8] and our "pure" ML) . It is useful to analyze the 
only method to our knowledge for which one can derive 
a proof of consistency in the present context, that is, the 
method of statistical moments. 



V. THE METHOD OF STATISTICAL 
MOMENTS 

The method of statistical moments provides a consis- 
tent estimate of the parameters for non-linear maps with 
ergodic properties. The method of statistical moments 
is the unique theoretically proven consistent estimator 
among all methods suggested so far by other authors. 
Although the moment estimates are known to have little 
efficiency, they are consistent! Consistency of all esti- 
mates suggested earlier including ours above were con- 
firmed only nimierically, which is very dangerous for in- 
stable non-linear maps. 

We consider four moment of the observed time se- 
ries: {s)j\j, {s'^)n, {s^)n and {siSi+i)N, where the brack- 
ets stand for time averaging over some time interval A^. 
Building on the knowledge that the series {xt} is ergodic 
[10] and using (1,2), we obtain the following relations 



}n 
)n 

{SiSi+i)N 



/oc : 



(3:^)00 + 3 (x)c 



Besides, averaging equation (2), we get 
(a;)oo = 1 - a(a;^)oo • 



(6) 
(7) 
(8) 
(9) 



(10) 



This provides us with five limit relations (6-10) with five 
unknown parameters: a, (x)^, {x'^)ooj {^^)ca and e. Solv- 
ing these five relations with respect to the unknown pa- 
rameters, we get the so-called estimates of the method of 
moments: 



(11) 



a = 


{SiSi+i)N + 2{s)n 




{x) 00 


{s)n , 


(2^ )oo 




\ = 


T i{s)N - {SiSi+i) 


= 






3{s)n 



(12) 
(13) 

(14) 
(15) 



Because of the limit relations (6-9) (which are valid be- 
cause of the ergodicity of the time series {xi} [10]), the 
estimates (11-15) are consistent if — > c». 
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2.000 


2.000 


1000 


0.5 


1.6907 lb 0.3496 


0.903 


2.000 


1.097 


10000 


0.5 


1.8244 lb 0.1659 


1.467 


2.000 


0.533 


100000 


0.5 


1.8554 lb 0.0741 


1.715 


2.000 


0.285 



TABLE II: Estimation of the structural paxameter a by the 
method of statistical moments (expression (11)) for the lo- 
gistic map Xi+i = 1 — aXi,a = 1.85; the observations axe 
Si = Xi + r/i; rji is a Gaussian random variable A''(0, e). As in 
tabic I, gi and 52 arc the sample quantilcs at the 2.5% and 
97.5% probability level, so that 52 — qi gives the width of the 
95% confidence intervals. Each estimate for a and std are 
based on 1000 simulated samples. 



We present in table II the estimates of the parameter a 
given by expression (11). The consistency of the method 
of statistical moments is clearly suggested by the numer- 
ical results, as seen from the bracketing of the true value 
by (o)± std and by q\ and 92- However, as we already 
pointed out, the method of statistical moments is rather 
inefficient: the ratio of its standard deviation for a to that 
of the "pure" ML is about 4 for TV = 100 and e = 0.1 for 
instance. 



VI. CONCLUDING REMARKS 

We have proposed a "pure" Maximum Likelihood (ML) 
method to estimate the structural parameter of a deter- 
ministically chaotic low- dimensional system (the logistic 
map), which adds the initial value x\ to the structural 
parameter to be determined. We have compared quan- 
titatively this method with the ML method proposed by 
McSharry and Smith [1] based on an averaging over the 
unknown invariant measure of the dynamical system. A 
key aspect of the implementation of our approach lies in 
the compromise between the need to use a large number 
of data points for the ML to become consistent and the 
unstable nature of dynamical trajectories which loses ex- 
ponentially fast the memory of the initial condition. This 
second aspect prevents using our "pure" ML for systems 
larger than 10 — 25 data points. For larger time series, 
we have found convenient to dcvidc them into subsys- 
tems of very small lengths and then to average over their 
estimations. Numerical tests suggest that this direct ML 
method provides often significantly better estimates than 
previously proposed approaches. 
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The difference between McSharry and Smith's aver- 
aging over the invariant measure and our "pure" ML is 
reminiscent of the distinction between "annealed" versus 
"quenched" averaging in the statistical physics of ran- 
dom systems, such as spin glasses [11, 12]. It has indeed 
been shown that the correct theory of strongly hetero- 
geneous media is obtained by performing the thermal 
Gibbs-Boltzmann averaging over fixed structural disor- 
der realizations, similarly to our use of a specific trajec- 
tory of the latent variables Xj's. In constrast, performing 
the thermal Gibbs-Boltzmann averaging together with 
an averaging over different realization of the structural 
disorder describes another type of physics, which is not 
that of fixed heterogeneity. This second incorrect type of 
averaging is similar to the averaging of the ML over the 
invariant measure performed by McSharry and Smith. 

There are several ways to improve our approach. One 
simple implementation is to use overlapping running win- 
dows. Another method is to re-estimate the realized 
trajectory by using the extended Kalman filter method 
(however, difiiculties may arise due to the existence of a 
maximum in the logistic map). Using shadowing meth- 
ods as proposed in [8] in our context would also be inter- 
esting to investigate. 

Let us end with a cautionary note. As we just said, 
the ML approach for two parameters (a, xi) that we sug- 
gest here evidently works only for a limited sample size 
N (perhaps, < 25 or so) due to the sensitivity upon 
initial conditions of the chaotic logistic map. As is well- 
known in classical statistics, ML-estimates have a bias 
that can be considerable if N is not large (say, N < 100 
or so). The ML-estimates are usually only asymptoti- 
cally unbiased. Thus, for N = 25 (and all the more for 
A = 4), ML-estimates can exhibit a considerable bias. 
Thus, averaging biased estimates as we proposed many 
not result in a consistent estimation. Therefore, we can- 
not assert that our ML method (as well as any other 
suggested methods) is consistent. We can only observe, 
for particular combinations of the considered parame- 
ters, the numerically determined mean square error of 
our suggested estimates with respect to the true parame- 
ter value. We are pleased if these errors are not too high, 
although our estimates can be biased (though, with small 
bias). But we are not able to make such bias arbitrarily 
small by increasing the sample size A. due to the insta- 
bility under the iterations of the logistic map which leads 
to a loss of information about the initial value xi . Thus, 
the situation is rather hopeless for the establishment of a 
meaningful statistical theory of estimation using the con- 
tinuous theory of classical statistics to such discontinuous 
objects as the invariant measures of chaotic dynamical 
systems. 
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Appendix A: One-step and total least-square 
estimations 

McSharry and Smith noticed that the one-step leasts- 
square method gives strongly biased results for the esti- 
mation of a [1]. Indeed, the method of estimation of the 
parameter a by the one-step least square method is evi- 
dently inconsistent, since the deviations (of the random 
variables) to be minimized in a least-square sense are 

Si+i~F{si,a) = Xi+i + ri^+i - F{xi + rit,a) 

= r]i+i + 2axirii + arjf , (16) 

which has non-zero expectation equal to ae^ . But, the 
fundamental least-square principle consists in the min- 
imization of deviations with zero mean. There are no 
least-square schemes that would suggest to minimize ran- 
dom deviations with non-zero mean depending on an un- 
known parameter. Thus, it is not reasonable to include 
the least-square method in any reasonable comparison. 

The method called by McSharry and Smith as "to- 
tal least-squares" (TLS) is applied in situation when the 
variables known only with some errors rjj. This 

situation is called in statistics a "Confiuence analysis," 
or "Estimation of a structural relation between two (or 
more) variables in the presence of errors on both vari- 
ables" [2-4]. In such a situation of confluence analysis, 
since the XiS are in fact unknown (nuisance) parameters 
whose number grows with sample size, there is no guar- 
antee of consistency of the ML estimates of the structural 
parameter a. 

As an example, let us consider the very simple conflu- 
ent scheme: 

Yi = + (17) 
Zi = Xi + Q. (18) 

Suppose we observe a sample of A^ pairs {Yi,Zi),i = 
1, A^, where Xi are unknown arbitrary values and rji, Q 
are iid Gaussian random variables with standard devia- 
tion e. The problem consists in estimating the parameter 
e. Similarly to the situation with (1) and (2) studied in 
[1], no restrictions are placed on the Xj's. The likelihood 
L{e,Xi,...,XNm,Z,),i = l,...,N) is 

i(e, Ai, AjvKi;, Z,), i = 1, AT) oc 



-27V 

e exp 



N 



N 



-{l/2e') ^(1- - A,)2 - {l/2e') ^(Z, - A, 



i=l 



(19) 

The MLE Xi's of the Xj's (that coincide in this case with 
the least-square estimates) are: 



Xi = 



Yi+Zi 



(20) 
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Inserting (20) into (19), we get 
L{e\{Yi,Zi), i = l,...,N)oc 



— 2N 

e exp 



N 



-(l/4e^)^(y,-Z,)^ 



i=l 



(21) 



Thus, the MLE of the parameter e obtained from (21) 
satisfies 



(22) 



Since E [(F, - Zj)^] = 2e^, the estimate (22) is inconsis- 
tent. A consistent ("corrected") estimate is 



(23) 



Thus, wc sec that the MLE of the structural parameter e 
is inconsistent due to the increasing number of nuisance 
parameters. Thus, the direct use of the least-square (or 
total least-square) in the confluent situation is not jus- 
tified, and was not recommended in any statistical text- 
book. Instead, standard statistical works recommend a 
"corrected" ML estimates (see for instance [3, 4]). 



We should stress in addition that there is a signifi- 
cant difference between the standard confiuent analysis 
and the problem addressed in [1]. Confiuent analysis 
deals with arbitrary unknown (distorted) arguments Xi, 
whereas in [1], the latent variables Xi are related by the 
non-linear map (2). The information on the structure of 
the XiS is not used in Confluence Analysis while it can 
really help in the estimation procedure as shown in [1] 
and in the present work. 
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